Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models
Abstract
The advent of transformer-based models such as BERT has led to the rise of neural ranking models. These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25. While monolingual retrieval tasks have benefited from large-scale training collections such as MS MARCO and advances in neural architectures, cross-language retrieval tasks have fallen behind these advancements. This paper introduces ColBERT-X, a generalization of the ColBERT multi-representation dense retrieval model that uses the XLM-RoBERTa (XLM-R) encoder to support cross-language information retrieval (CLIR). ColBERT-X can be trained in two ways. In zero-shot training, the system is trained on the English MS MARCO collection, relying on the XLM-R encoder for cross-language mappings. In translate-train, the system is trained on the MS MARCO English queries coupled with machine translations of the associated MS MARCO passages. Results on ad hoc document ranking tasks in several languages demonstrate substantial and statistically significant improvements of these trained dense retrieval models over traditional lexical CLIR baselines.
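The difference between the two training regimes lies in the training data rather than in the model. The following is a minimal sketch of that difference, not the authors' pipeline: it assumes training examples simplified to (query, passage) pairs, and `TrainingPair`, `zero_shot_pairs`, `translate_train_pairs`, and `toy_translate` are hypothetical names introduced only for illustration. The toy translator stands in for a real machine translation system.

```python
from typing import Callable, NamedTuple


class TrainingPair(NamedTuple):
    """Simplified training example (assumption: one query, one passage)."""
    query: str    # the query stays in English in both regimes
    passage: str  # the passage associated with the query


def zero_shot_pairs(english: list[TrainingPair]) -> list[TrainingPair]:
    """Zero-shot: train on the English pairs unchanged and rely on the
    multilingual encoder for the cross-language mapping."""
    return list(english)


def translate_train_pairs(
    english: list[TrainingPair], translate: Callable[[str], str]
) -> list[TrainingPair]:
    """Translate-train: keep each English query, but replace its passage
    with a machine translation into the document language."""
    return [TrainingPair(p.query, translate(p.passage)) for p in english]


def toy_translate(text: str) -> str:
    """Hypothetical stand-in for a machine translation system."""
    return f"<target-language> {text}"


english = [TrainingPair("what is bm25", "BM25 is a lexical ranking function.")]
zs = zero_shot_pairs(english)
tt = translate_train_pairs(english, toy_translate)
```

In both cases the query side remains English; only the passage side changes language, which is what lets the trained model match English queries against non-English documents.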
Related resources
Alternative Approaches for Cross-Language Text Retrieval
The explosive growth of the Internet and other sources of networked information has made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text, and it is becoming practical to automatically convert some printed documents and recorded speech to electronic text as well. Thus automated systems cap...
Different approaches to Cross Language Information Retrieval
This paper describes two experiments in the domain of Cross Language Information Retrieval. Our basic approach is to translate queries word by word using machine-readable dictionaries. The first experiment compared different strategies to deal with word sense ambiguity: i) keeping all translations and integrating translation probabilities in the model, ii) a single translation is selected on the ...
Assessing Wikipedia-Based Cross-Language Retrieval Models
... saved me a great deal of time through their help with the machine translations.
Cross-language Transfer of Multilingual Phoneme Models
We present a method to use speech data from multiple languages to enhance the performance of a flexible vocabulary command word recognizer which is trained using a small amount of speech data of the target language. We develop data-driven approaches for identification of multilingual phoneme units and mapping of these units to the target language phonemes, and evaluate them against the knowledg...
An automated linguistic knowledge-based cross-language transfer method for building acoustic models for a language without native training data
In this paper we describe an automated, linguistic knowledge-based method for building acoustic models for a target language for which there is no native training data. The method assumes availability of well-trained acoustic models for a number of existing source languages. It employs statistically derived phonetic and phonological distance metrics, particularly a combined phonetic-phonological...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-99736-6_26